SlopMap: a software application tool for quick and flexible identification of similar sequences using exact k-mer matching.

Authors

  • Ilya Y Zhbannikov
  • Samuel S Hunter
  • Matthew L Settles
  • James A Foster
Abstract

With the advent of Next-Generation (NG) sequencing, it has become possible to sequence entire genomes quickly and inexpensively. However, in some experiments one only needs to extract and assemble a portion of the sequence reads, for example when performing transcriptome studies, sequencing mitochondrial genomes, or characterizing exomes. Given the raw DNA library of a complete genome, identifying the reads of interest would appear to be a trivial problem. But it is not always easy to incorporate well-known tools such as BLAST, BLAT, Bowtie, and SOAP directly into bioinformatics pipelines before the assembly stage, either due to incompatibility with the assembler's file inputs, or because it is desirable to incorporate information that must be extracted separately. For example, in order to incorporate flowgrams from a Roche 454 sequencer into the Newbler assembler, it is necessary to first extract them from the original SFF files. We present SlopMap, a bioinformatics software utility that quickly identifies reads similar to a provided reference in either a Roche 454 or an Illumina DNA library. With a simple and intuitive command-line interface, along with output file formats compatible with assembly programs, SlopMap can be embedded directly in a biological data processing pipeline without any additional programming work. In addition, SlopMap preserves the flowgram information needed by the Roche 454 assembler.
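The abstract does not describe SlopMap's internals, so the following is only a minimal sketch of the general idea named in the title: selecting reads by exact k-mer matching against a reference. The function names, the default `k`, and the `min_fraction` threshold are all illustrative assumptions, not SlopMap parameters, and the sketch ignores reverse complements and quality or flowgram data.

```python
def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(reference, k):
    """Collect the set of all k-mers that occur in the reference."""
    return set(kmers(reference.upper(), k))


def kmer_similarity(read, index, k):
    """Return the fraction of the read's k-mers found exactly in the index."""
    read_kmers = list(kmers(read.upper(), k))
    if not read_kmers:
        return 0.0  # read is shorter than k, so it has no k-mers
    hits = sum(1 for km in read_kmers if km in index)
    return hits / len(read_kmers)


def select_reads(reads, reference, k=5, min_fraction=0.5):
    """Keep the names of reads whose k-mer hit fraction meets the threshold.

    k and min_fraction are hypothetical defaults chosen for this toy example.
    """
    index = build_index(reference, k)
    return [name for name, seq in reads
            if kmer_similarity(seq, index, k) >= min_fraction]


# Toy data: read1 is an exact substring of the reference, read2 is unrelated.
reference = "ACGTACGTTAGCCGATAGGCTA"
reads = [
    ("read1", "CGTTAGCCGATA"),
    ("read2", "TTTTTTTTTTTT"),
]
print(select_reads(reads, reference))
```

A set-based index makes each k-mer lookup constant time on average, which is the usual reason exact k-mer matching is fast; a real tool would also need to parse SFF or FASTQ input and handle both strands.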


Similar resources

Exact Mixed Integer Programming for Integrated Scheduling and Process Planning in Flexible Environment

This paper presented a mixed integer programming for integrated scheduling and process planning. The presented process plan included some orders with precedence relations similar to Multiple Traveling Salesman Problem (MTSP), which was categorized as an NP-hard problem. These types of problems are also called advanced planning because of simultaneously determining the appropriate sequence and m...


Application of Support Vector Machine Regression for Predicting Critical Responses of Flexible Pavements

This paper aims to assess the application of Support Vector Machine (SVM) regression to the analysis of flexible pavements. To this end, 10000 four-layer flexible pavement sections consisting of an asphalt concrete layer, granular base layer, granular subbase layer, and subgrade soil were analyzed under the effect of standard axle loading using multi-layered elastic theory and pavement critical r...


QuicK-mer: A rapid paralog sensitive CNV detection pipeline

QuicK-mer is a unified pipeline for estimating genome copy-number from high-throughput Illumina sequencing data. QuicK-mer utilizes the Jellyfish application to efficiently tabulate counts of predefined sets of k-mers. The program performs GC-normalization using defined control regions and reports paralog-specific estimates of copy-number suitable for downstream analysis. The package is freely ...


Application of Artificial Neural Networks for Analysis of Flexible Pavements under Static Loading of Standard Axle

In this study, an artificial neural network was developed in order to analyze flexible pavement structure and determine its critical responses under the influence of standard axle loading. In doing so, more than 10000 four-layered flexible pavement sections composed of asphalt concrete layer, base layer, subbase layer, and subgrade soil were analyzed under the impact of standard axle loading. P...


Neptune: A Tool for Rapid Microbial Genomic Signature Discovery

Neptune locates genomic signatures using an exact k-mer matching strategy while accommodating k-mer mismatches. The software identifies sequences that are sufficiently represented within “inclusion targets” and sufficiently absent from “exclusion targets”. The signature discovery process is accomplished using probabilistic models instead of heuristic strategies. We have evaluated Neptune on L...



Journal:
  • Journal of data mining in genomics & proteomics

Volume 4, Issue 3

Pages: -

Publication date: 2013